ABSTRACT

In recent years, interest in computational intelligence techniques, which currently include
neural networks, fuzzy systems, and evolutionary programming, has grown significantly, and a number
of applications have been developed in government and industry. In the future, an essential
element of these systems will be fuzzy systems that learn from experience, using neural networks
to refine their performance.

The GARIC architecture, introduced earlier, is an example of a fuzzy reinforcement learning 
system which has been applied in several control domains such as cart-pole balancing, simulation of the 
Space Shuttle orbital operations, and tether control. A number of examples from GARIC's applications 
in these domains will be demonstrated. For more details on the following notes see Refs. 5, 1, 3, 4, 7, 
6 and 2. 
POTENTIALS FOR USE IN NEW MILLENNIUM
AND ACCESS TO SPACE PROJECTS 


Demo flights 

- Rapid development of control systems using 
approximate control rules 

- Automatic refinement of these control systems 
Rendezvous and docking of spacecraft
Landing on asteroids

Rovers to survey a planet's surface
Miniaturization by using smart sensors 



EVOLUTION OF FUZZY SYSTEMS 


Stage I: 

- Introducing the fuzzy sets idea 
Stage II: 

- Demonstrating applications 

- Dominated by engineering (control) 
Stage III: 

- Learning fuzzy systems 



MOTIVATIONS FOR USING FUZZY LOGIC AND 
NEURAL NETWORKS IN CONTROL 


Human expert controllers perform well using 
approximate reasoning 

An analytical model may not always be 
available. 

If a physical system learns to control itself, then
it is intelligent.

Fuzzy logic and neural networks facilitate 
interpolation which removes many of the 
restrictions of symbolic systems. 



FUZZY SYSTEMS AS UNIVERSAL APPROXIMATORS 


• Fuzzy rules as patches for approximating a function. 

• A fuzzy system can approximate any real continuous 
function to any degree of accuracy 
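
The "patches" idea can be illustrated with a minimal sketch. It is not taken from the slides: the triangular membership functions, the evenly spaced rule centers, and the names `triangle`, `fuzzy_approx`, and `max_error` are all assumptions chosen for illustration. Each rule reads "if x is near c_i then y is f(c_i)", and the rule outputs are blended by a weighted average. Adding rules shrinks the patches and the approximation error.

```python
import math

def triangle(x, center, width):
    """Triangular membership: 1 at the center, 0 beyond +/- width."""
    return max(0.0, 1.0 - abs(x - center) / width)

def fuzzy_approx(f, lo, hi, n_rules):
    """Build a fuzzy approximator of f on [lo, hi] from n_rules rules.

    Hypothetical construction: rule i is "if x is near c_i then y is f(c_i)".
    """
    step = (hi - lo) / (n_rules - 1)
    centers = [lo + i * step for i in range(n_rules)]
    outputs = [f(c) for c in centers]

    def predict(x):
        # Rule strengths, then a weighted average of the rule outputs.
        weights = [triangle(x, c, step) for c in centers]
        total = sum(weights)
        return sum(w * y for w, y in zip(weights, outputs)) / total

    return predict

def max_error(f, approx, lo, hi, samples=1000):
    """Largest absolute error of approx against f on a sample grid."""
    xs = [lo + (hi - lo) * k / samples for k in range(samples + 1)]
    return max(abs(f(x) - approx(x)) for x in xs)

coarse = max_error(math.sin, fuzzy_approx(math.sin, 0.0, math.pi, 5), 0.0, math.pi)
fine = max_error(math.sin, fuzzy_approx(math.sin, 0.0, math.pi, 41), 0.0, math.pi)
```

Here `fine` is much smaller than `coarse`, which is the practical meaning of "any degree of accuracy" for a continuous function on a bounded interval.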



LEARNING METHODS


Supervised learning 
Reinforcement learning 
Unsupervised learning 

In supervised learning, a teacher provides the 
desired control objective at each time step to the 
learning system. 

In reinforcement learning, the teacher's 
response is not as direct, immediate, and 
informative as in supervised learning and it 
serves more to evaluate the state of the system. 

The presence of a teacher or a supervisor to 
provide the correct control response is not 
assumed in unsupervised learning. 



LEARNING METHODS (Cont.)


If supervised learning can be used in control
(e.g., when input-output training data are
available), it has been shown to be more
efficient than reinforcement learning.

Many control problems require selecting control 
actions whose consequences emerge over 
uncertain periods for which input-output training 
data are not readily available. In such domains, 
reinforcement learning techniques are more 
appropriate than supervised learning. 



REINFORCEMENT LEARNING 


Assumes no supervisor to critically judge the 
chosen control action at each time step. 

The learning system is told indirectly about the 
effect of its chosen control action. 

Previous work: Samuel's checkers-playing
program, Michie and Chambers' BOXES system, and
Barto, Sutton, and Anderson's AHC algorithm.

Reinforcement learning has its roots in studies of 
animal learning, and learning automata research 
in control engineering. 

Construct an internal evaluator or a critic capable 
of evaluating the dynamic system's performance. 



GENERALIZED APPROXIMATE REASONING-BASED 
INTELLIGENT CONTROL (GARIC) 

(H. Berenji and P. Khedkar)
THE GARIC ARCHITECTURE


The Action Selection Network maps a state
vector into a recommended action F, using fuzzy
inference.

The Action Evaluation Network maps a state
vector and a failure signal into a scalar score
which indicates state goodness. This is also
used to produce internal reinforcement $\hat{r}$.

The Stochastic Action Modifier uses both F and
$\hat{r}$ to produce an action F' which is applied to the
plant.



THE ACTION SELECTION NETWORK 


Layer 1: the input layer, consisting of the
real-valued input variables.

Layer 2: nodes represent possible values of
linguistic variables in Layer 1.

Layer 3: conjunction of all the antecedent 
conditions in a rule using softmin operation. 

Layer 4: a node corresponds to a consequent 
label with an output. 

Layer 5: nodes as output action variables where 
the inputs come from Layer 3 and Layer 4. 



THE ACTION EVALUATION NETWORK (Cont.) 


The output unit of the evaluation network:

$$v[t, t+1] = \sum_{i} b_i[t]\, x_i[t+1] + \sum_{i} c_i[t]\, y_i[t, t+1]$$

where v is the prediction of reinforcement.
Evaluation of the recommended action:

$$\hat{r}[t+1] = \begin{cases} 0 & \text{start;} \\ r[t+1] - v[t, t] & \text{failure;} \\ r[t+1] + \gamma\, v[t, t+1] - v[t, t] & \text{else} \end{cases}$$

where $0 \le \gamma \le 1$ is the discount rate.



THE ACTION EVALUATION NETWORK (Cont.) 


The input is the state of the plant and the output
is an evaluation of the state (a score), denoted by
v.

The v-value is suitably discounted and combined
with the external failure signal to produce internal
reinforcement $\hat{r}$.

The output of the units in the hidden layer is:

$$y_i[t, t+1] = g\Big(\sum_{j} a_{ij}[t]\, x_j[t+1]\Big)$$

where

$$g(s) = \frac{1}{1 + e^{-s}}$$

and t and t + 1 are successive time steps.
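
The hidden-layer, output, and internal-reinforcement formulas on these AEN slides can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the plain-list weight layout (`a` as one row per hidden unit, `b` and `c` as vectors), and the boolean `start`/`failure` flags are assumptions.

```python
import math

def sigmoid(s):
    """Logistic function g(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

def aen_value(x, a, b, c):
    """Score v = sum_i b_i x_i + sum_i c_i y_i, with y_i = g(sum_j a_ij x_j).

    x: state vector (including any bias entry), a: hidden weights,
    b: direct input-to-output weights, c: hidden-to-output weights.
    """
    hidden = [sigmoid(sum(a_ij * x_j for a_ij, x_j in zip(row, x))) for row in a]
    direct = sum(b_i * x_i for b_i, x_i in zip(b, x))
    return direct + sum(c_i * y_i for c_i, y_i in zip(c, hidden))

def internal_reinforcement(r_next, v_prev, v_next, gamma, start=False, failure=False):
    """Internal reinforcement for the three cases: start, failure, otherwise.

    v_prev stands for v[t, t] (weights at t, state at t);
    v_next stands for v[t, t+1] (weights at t, state at t+1).
    """
    if start:
        return 0.0
    if failure:
        return r_next - v_prev
    return r_next + gamma * v_next - v_prev
```

Note that both scores are computed with the same weights, so a change in v reflects a change of state rather than a change of weights.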



THE ACTION EVALUATION NETWORK 


The AEN plays the role of an adaptive critic 
element and constantly predicts reinforcements 
associated with different input states. 

The only information received by the AEN is the 
state of the physical system in terms of its state 
variables and whether or not a failure has 
occurred. 

The AEN is a standard two-layer feed forward 
net with sigmoids everywhere except in the 
output layer. 



GARIC APPLIED TO CART-POLE BALANCING 
LEARNING IN ASN


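• We use the following learning rule

$$\Delta p = \eta \frac{\partial v}{\partial p} = \eta \frac{\partial v}{\partial F} \frac{\partial F}{\partial p}$$

where p is a tunable parameter of the ASN and $\eta$ is the learning rate.

• We assume that dv/dF can be computed by the
instantaneous difference ratio

$$\frac{\partial v}{\partial F} \approx \frac{dv}{dF} \approx \frac{v(t) - v(t-1)}{F(t) - F(t-1)}$$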
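
A minimal sketch of this update follows, under stated assumptions: `dF_dp` (the sensitivity of the action to the parameter) is supplied by the caller, the names are hypothetical, and the guard that returns zero when the action did not change between steps is an added safeguard, not something the slide specifies.

```python
def dv_dF_estimate(v_now, v_prev, F_now, F_prev, eps=1e-8):
    """Instantaneous difference ratio (v(t) - v(t-1)) / (F(t) - F(t-1)).

    Assumption: when the action barely changed, return 0.0 so that no
    update is made instead of dividing by (nearly) zero.
    """
    dF = F_now - F_prev
    if abs(dF) < eps:
        return 0.0
    return (v_now - v_prev) / dF

def update_parameter(p, eta, dv_dF, dF_dp):
    """One step that raises v: p <- p + eta * (dv/dF) * (dF/dp)."""
    return p + eta * dv_dF * dF_dp

slope = dv_dF_estimate(0.6, 0.5, 1.2, 1.0)   # score rose as the action rose
p_new = update_parameter(1.0, 0.1, slope, 2.0)
```

The update moves the parameter in the direction that increases the AEN score v, so the ASN is tuned toward actions the critic rates more highly.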
STOCHASTIC ACTION MODIFIER


Uses $\hat{r}$ from the previous time step and the
action F recommended by the ASN to generate
a Gaussian random action F' with mean F and
standard deviation $\sigma(\hat{r}(t-1))$. $\sigma(\cdot)$ is a non-negative,
monotonically decreasing function
such as $\exp(-\hat{r})$. F' is applied to the plant.

Stochastic perturbation leads to a better
exploration of state space and better
generalization ability:

$$s(t) = \frac{F'(t) - F(t)}{\sigma(\hat{r}(t-1))}$$


The magnitude of the deviation |F' - F| is large
when $\hat{r}$ is low, and small when the internal
reinforcement is high.
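
The behavior described on this slide can be sketched as follows. The choice of exp(-r) for the spread comes from the slide. The function names, the fixed seed, and the sample count are assumptions made only so the illustration is repeatable.

```python
import math
import random

def sigma(r_hat):
    """Non-negative, monotonically decreasing spread: exp(-r_hat)."""
    return math.exp(-r_hat)

def stochastic_action(F, r_hat_prev, rng):
    """Draw F' from a Gaussian with mean F and std sigma(r_hat_prev)."""
    return rng.gauss(F, sigma(r_hat_prev))

def mean_abs_deviation(F, r_hat_prev, rng, n=2000):
    """Average |F' - F| over n draws (illustration only)."""
    return sum(abs(stochastic_action(F, r_hat_prev, rng) - F) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the sketch is repeatable
spread_low_reward = mean_abs_deviation(1.0, -1.0, rng)   # poor state: wide search
spread_high_reward = mean_abs_deviation(1.0, 2.0, rng)   # good state: narrow search
```

With low internal reinforcement the sampled actions scatter widely around F, and with high internal reinforcement they stay close to the ASN's recommendation.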



RULE STRENGTH CALCULATION USING 
SOFTMIN OPERATOR 


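Using the softmin, the strength of Rule 1 is:

$$w_1 = \frac{\mu_{A_1}(x_0)\, e^{-k \mu_{A_1}(x_0)} + \mu_{B_1}(y_0)\, e^{-k \mu_{B_1}(y_0)}}{e^{-k \mu_{A_1}(x_0)} + e^{-k \mu_{B_1}(y_0)}}$$

where $A_1$ and $B_1$ are the antecedent labels of Rule 1, $x_0$ and $y_0$ are the
measured inputs, and k sets how closely the softmin tracks the true minimum.

Similarly, we can find $w_2$ for Rule 2.

The control output of Rule 1:

$$z_1 = \mu_{C_1}^{-1}(w_1)$$

and for Rule 2:

$$z_2 = \mu_{C_2}^{-1}(w_2)$$

Using a weighted averaging approach, $z_1$ and
$z_2$ are combined to produce the combined result

$$z = \frac{w_1 z_1 + w_2 z_2}{w_1 + w_2}$$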
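
A minimal sketch of the two operations on this slide, the softmin rule strength and the weighted average of rule outputs, is given below. The names `softmin` and `weighted_average` and the default sharpness `k=10.0` are assumptions. The rule outputs are taken as given numbers here, since computing them requires the consequent membership functions, which are not shown.

```python
import math

def softmin(values, k=10.0):
    """Differentiable stand-in for min: sum(v * exp(-k v)) / sum(exp(-k v)).

    k = 0 gives the plain mean; larger k approaches the true minimum.
    """
    weights = [math.exp(-k * v) for v in values]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_average(strengths, outputs):
    """Combine rule outputs z_r using rule strengths w_r as weights."""
    return sum(w * z for w, z in zip(strengths, outputs)) / sum(strengths)

# Two rules, each with two antecedent membership degrees (made-up values).
w1 = softmin([0.7, 0.4])
w2 = softmin([0.2, 0.9])
z = weighted_average([w1, w2], [1.5, -0.5])
```

Unlike a hard minimum, the softmin is smooth in every antecedent degree, which is what makes gradient-based tuning of the membership functions possible.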



CONCLUSIONS 


With the GARIC architecture, we have proposed 
a new way of designing and tuning a fuzzy logic 
controller. 

The process control knowledge can now be 
modeled using approximate linguistic terms and 
later refined through the process of learning from 
experience. 

GARIC combines the qualitative knowledge of
human experts, expressed as symbolic rules, with the
learning strength of artificial neural networks.

The GARIC architecture is general enough for 
use in other rule-based systems which perform 
fuzzy logic inference. 



FUZZY RULES FOR GUIDING 
CONCLUSION


• Knowledge + performance-driven learning
for both action evaluation and selection



• Easy to build in a priori knowledge 

• Easy to tune approximate knowledge 

• Generalizable to arbitrary characterization of 
state space 

• Hierarchical techniques of knowledge structuring 
may be useful 

• Integrated/uniform structure and algorithms for 
both ASN and AEN 
CLUSTERING IN PRODUCT SPACE


The current set of N neurons collectively vote to 
determine the net's prediction. 

The learning rule says that the update is 
proportional to the influence of neuron j, the 
(signed) error generated, and the corresponding 
input. 
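
One way to read the voting and update statements above is sketched below. This is an interpretation, not the authors' algorithm: the normalized Gaussian activations used as each neuron's "influence", the linear rule consequents, the learning rate `eta`, and all function names are assumptions.

```python
import math

def influences(s, centers, width):
    """Normalized Gaussian activations: each neuron's share of the vote."""
    acts = [math.exp(-sum((si - ci) ** 2 for si, ci in zip(s, c)) / (2 * width ** 2))
            for c in centers]
    total = sum(acts)
    return [a / total for a in acts]

def predict(s, centers, coeffs, width):
    """Collective vote: influence-weighted sum of each neuron's linear output."""
    rho = influences(s, centers, width)
    ext = [1.0] + list(s)  # leading 1 pairs with the constant coefficient
    return sum(r * sum(c * x for c, x in zip(cr, ext)) for r, cr in zip(rho, coeffs))

def train_step(s, target, centers, coeffs, width, eta):
    """Update each coefficient by eta * influence * signed error * input."""
    rho = influences(s, centers, width)
    ext = [1.0] + list(s)
    error = target - predict(s, centers, coeffs, width)
    for r, cr in zip(rho, coeffs):
        for i, x in enumerate(ext):
            cr[i] += eta * r * error * x
    return error

# Single neuron, single training point: each step halves the error.
centers = [[0.0]]
coeffs = [[0.0, 0.0]]
first_error = train_step([1.0], 3.0, centers, coeffs, width=1.0, eta=0.25)
second_error = train_step([1.0], 3.0, centers, coeffs, width=1.0, eta=0.25)
```

Neurons with little influence on a given input receive correspondingly small updates, so each rule is tuned mostly by the data in its own region.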



CLUSTERING IN PRODUCT SPACE 
(H. R. Berenji, P. S. Khedkar)


• Generate initial set of fuzzy rules from raw data. 

• Using radial basis functions and an extended 
clustering approach. 

• $f : \mathbb{R}^n \to \mathbb{R}^m$, where n and m are the input and
output dimensions.

• A neuron represents a fuzzy rule r:

If $s_1$ is $\xi_{r1}$ and $s_2$ is ... then $y_{r1}$ is
$c_{r10} + c_{r11} s_1 + \dots + c_{r1n} s_n$, $y_{r2}$ is
$c_{r20} + c_{r21} s_1 + \dots + c_{r2n} s_n$
SPACE SHUTTLE ATTITUDE CONTROL


The controller is expected to perform four basic
operations:

1. Attitude hold or maintaining the desired
attitude within a small region of the desired 
value, typically known as a deadband. 

2. Attitude maneuver or going from one attitude 
to another. 

3. Rate hold or maintaining a desired rate on a 
given axis. 

4. Rate maneuver or going from one rate value 
to another rate value for a given axis. 



The Space Shuttle's on-orbit controller, or Digital AutoPilot, is based
on modern digital control theory and is a highly
optimized controller.

It uses two types of thrusters (two levels of jet 
thrusts), known as primary and vernier, and 
operates with two different sets of deadband 
values. 

It can perform rate maneuvers in pulse as well as 
discrete modes. Typical perturbations acting on 
the system include gravity gradient, aerodynamic 
torques, and translational burns. 



FUZZY CONTROL RULES (JET FIRING COMMANDS) 
FOR ATTITUDE CONTROL 
The structure of GARIC for the Space Shuttle
consists of the following:

- In ASN, there are two inputs, error and error
rate, each using seven labels, 31 rules with
conclusions using five labels, and a single
output. Hence, the network has 2, 14, 31, 5, 1
neurons in its five layers.

- In AEN, there are two inputs, error and error
rate, and a bias unit, 31 hidden layer
nodes, and a single output. Hence, the
network has 3, 31, 1 neurons in its three
layers.
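
The layer sizes quoted above follow directly from the counts of inputs, labels, and rules. The short sketch below spells out that arithmetic. The helper names are hypothetical, and the assumption is that every input uses the same number of labels, as stated on the slide.

```python
def asn_layer_sizes(n_inputs, labels_per_input, n_rules, n_output_labels, n_outputs=1):
    """Five ASN layers: inputs, antecedent labels, rules, consequent labels, outputs."""
    return [n_inputs, n_inputs * labels_per_input, n_rules, n_output_labels, n_outputs]

def aen_layer_sizes(n_state_inputs, n_hidden, n_outputs=1):
    """Three AEN layers; the input layer has one extra bias unit."""
    return [n_state_inputs + 1, n_hidden, n_outputs]

shuttle_asn = asn_layer_sizes(2, 7, 31, 5)   # 2 inputs x 7 labels = 14 label nodes
shuttle_aen = aen_layer_sizes(2, 31)
```

The same helpers give 2, 14, 49, 7, 1 and 3, 49, 1 for a configuration with 49 rules, seven output labels, and 49 hidden units.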



For each rule, seven labels (NB, NM, NS, ZE, 
PS, PM, PB) are used for angle error and angle 
error rate and five labels (NM, NS, ZE, PS, PM) 
are used for jet firing commands. 

In a learning experiment, a failure occurs when 
the value of a state variable goes beyond the 
allowed deadband. 

Every time a failure occurs in a GARIC 
execution, the control is shifted to a supervisory 
control routine to bring the state of the system 
back to within the deadband. 



RESULTS 


With a small number of trials (fewer than ten),
GARIC learns to hold the error within a ±0.4
deadband.

A similar experiment was performed to train this
newer controller to hold the error within a ±0.3
deadband.

This time five trials were needed for GARIC to
learn this new task.

Although an adaptive behavior has been added,
the fuel consumption for the 100,000-time-step
simulation runs was about 222 lb.



CONCLUSIONS 


Fuzzy Logic and Neural Networks can be 
combined and used for intelligent control. 

Neural networks can provide adaptive 
performance for fuzzy logic controllers. 

Fuzzy logic can provide knowledge 
representation capabilities for neural networks. 



TETHER CONTROL USING GARIC 


Tether control consists of three main operations: 

- Deployment Phase: 

* A conducting tether is used to deploy a
payload.

* e.g., Italian satellite weighing 525 kg, 
deployed to a distance of 20 km. 

- On-station Phase: 

* Acquire scientific and operational data. 

- Retrieval Phase: 

* Retrieve up to 2.4 km. 

* Dampen oscillations. 

* Completely retrieve the payload for reuse. 



COMPLEXITIES 


In vacuum, zero-g, and under gravitational and 
magnetic forces. 

Time varying dynamics of a long, flexible, 
variable length tether, the orbiter and the 
payload. 

Unlike tether length and tether tension, 
oscillation cannot be directly measured or 
controlled. 



LONGITUDINAL AND LIBRATIONAL OSCILLATION
IN A TETHERED PAYLOAD SYSTEM


[Fig. 1: Longitudinal and librational oscillations in a tethered payload system. Panels: Libration Mode (Nose to Tail), Bobbing Mode (In-plane), Satellite Pendulous Mode (Port to Starboard).]
APPLYING GARIC FOR TETHER CONTROL


ASN 

- Inputs: Length error and length rate error, 
each uses seven labels. 

- Output: Change in voltage to be applied to 
motors. Seven labels for conclusion. 

- Number of rules: 49 

- Network architecture: Network has 2, 14, 49, 
7, 1 neurons in its five layers. 

AEN 

- Inputs: Length error, length rate error and bias 
unit. 

- Output: Single output. 

- Number of hidden neurons: 49 

- Network architecture: Network has 3, 49, 1 
neurons in its three layers. 



APPLYING GARIC FOR TETHER CONTROL 


Failure: Length error greater than two degrees. 


On failure, use a supervisory controller to bring 
error within the specified limits. 



TETHER CONTROL RULES 


[Figure 3: Fuzzy control rules for tether control. Surviving axis label: LENGTH RATE.]




RESULTS 


Learning experiments were performed during
the deployment phase (i.e., 16,200 s).


GARIC learned to maintain the deadband in a
small number of trials (fewer than five).



FQ-LEARNING: A REINFORCEMENT LEARNING APPROACH 
TO FUZZY DYNAMIC PROGRAMMING 
CONCLUSION


Fuzzy Logic as the base for soft computing 

Fuzzy Logic as a powerful tool for knowledge 
representation in computational intelligence 

The key for computational intelligence 


FUZZY SYSTEMS THAT CAN LEARN!!!!!! 